Selecting Representative Data Sets

Authors

  • Tomas Borovicka
  • Marcel Jirina
  • Pavel Kordik
  • Marcel Jirina
Abstract

A training set is a set of labeled data that provides known information and is used in supervised learning to build a classification or regression model. Each training instance can be viewed as a feature vector paired with an appropriate output value (a label or class identifier). A supervised learning algorithm infers a classification or regression function from the given training set, and the inferred function should predict an appropriate output value for any input vector. The goal of the training phase is to estimate the parameters of a model so that it predicts output values well when the model is used in practice.
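The setup described in the abstract can be illustrated with a minimal sketch. It assumes a simple k-nearest-neighbor rule, which is chosen here only for illustration and is not taken from the paper. The toy data and the names `TRAINING_SET` and `predict` are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical toy training set: each instance is a (feature vector, label) pair.
TRAINING_SET = [
    ((1.0, 1.2), "a"),
    ((0.8, 1.0), "a"),
    ((1.1, 0.9), "a"),
    ((4.0, 4.2), "b"),
    ((4.1, 3.9), "b"),
    ((3.8, 4.0), "b"),
]


def predict(training_set, x, k=3):
    """Return the majority label among the k training instances nearest to x."""
    # Sort instances by Euclidean distance between their feature vector and x.
    nearest = sorted(training_set, key=lambda pair: dist(pair[0], x))[:k]
    # Count the labels of the k nearest instances and take the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Calling `predict(TRAINING_SET, (1.0, 1.0))` asks the learned rule for the label of an input vector that is not in the training set. How well such predictions hold up depends on how representative the training instances are of the inputs seen in real use.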

Related resources

Selecting Representative Images from Larger Photo Sets

Selecting a single photograph to represent a set of photographs is a useful approach when creating interfaces to large photo collections. Unfortunately, many different methods for selecting photographs are used in practice today, and none have been well studied or shown to be particularly effective. In this paper we look at several different common methods for selecting a single photograph in o...

A Framework for Mining High Dimensional Data for Feature Subset Selection

Features are representative characteristics of data sets. Identifying such features in a high-dimensional dataset plays an important role in real-world applications. Data mining is best used to determine important features. Selecting important features from a subset of identified features can help in making expert decisions. However, efficient identification of such feature subset and selection ...

Density-Based Multiscale Data Condensation

A problem gaining interest in pattern recognition applied to data mining is that of selecting a small representative subset from a very large data set. In this article, a nonparametric data reduction scheme is suggested. It attempts to represent the density underlying the data. The algorithm selects representative points in a multiscale fashion which is novel from existing density-based approa...

Shapes of Features and a Modified Measure for Linear Discriminant Analysis

In this paper, the problem of selecting the most representative features from a feature set is considered. Two new feature selection algorithms are introduced, and their performance is compared with that of some well-known feature selection algorithms. The algorithms are tested on the iris data set, three artificially generated data sets, and a data set obtained from steel surfaces.

Selecting a representative training set for the classification of demolition waste using remote NIR sensing

In the AUTOSORT project, the goal is the separation of demolition waste in three fractions: wood, plastics and stone. A remote near-infrared sensor measures reduced reflectance spectra (mini-spectra) of objects. Linear discriminant analysis (LDA) is used for the classification of these spectra. To obtain the LDA model, a representative training set is needed. New LDA-models will be regularly need...

Summarizing Sets of Categorical Sequences - Selecting and Visualizing Representative Sequences

This paper is concerned with the summarization of a set of categorical sequence data. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that have together a given percentage of sequences in their neighborhood. The goal is to yield a representative set that exhibits the key f...

Publication date: 2014